Cyclic Staggered Scheme: A Loop Allocation Policy for DOACROSS Loops

نویسندگان

  • Ali R. Hurson
  • Krishna M. Kavi
  • Joford T. Lim
چکیده

Within the scope of the multithreaded dataflow, the problem of scheduling/allocation of DOACROSS loops has been discussed and it was shown that the so-called staggered allocation offers higher performance and resource utilization than other schemes described in the literature. The staggered scheme, however, produces an unbalanced load among processors. This paper introduces an extension to the staggered scheme—cyclic staggered scheme—that produces a more balanced distribution of iterations among processors. The cyclic staggered scheme is simulated and its performance improvement is analyzed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On Effective Execution of Nonuniform DOACROSS Loops

It is extremely difficult to parallelize DOACROSS loops with non-uniform loop-carried dependences. In this paper, we present a static scheduling scheme with an accompanying synchronization strategy that can execute such DOACROSS loops effectively and efficiently. Our approach uses one of the parallelization techniques called Dependence Uniformization, which finds a small set of uniform dependen...

متن کامل

Effects of Parallelism Degree on Run-Time Parallelization of Loops

Due to the overhead for exploiting and managing parallelism, run-time loop parallelization techniques with the aim of maximizing parallelism may not necessarily lead to the best performance. In this paper, we present two parallelization techniques that exploit different degrees of parallelism for loops with dynamic crossiteration dependences. The DOALL approach exploits iterationlevel paralleli...

متن کامل

Redundant Synchronization Elimination for DOACROSS Loops

Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered them. The problem of redundant synchronization elimination in DOACROSS loops is investigated and an efficient algorithm that identifies redundant synchronizations in multiply-neste...

متن کامل

Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences

ÐThis paper presents a time stamp algorithm for runtime parallelization of general DOACROSS loops that have indirect access patterns. The algorithm follows the INSPECTOR/EXECUTOR scheme and exploits parallelism at a fine-grained memory reference level. It features a parallel inspector and improves upon previous algorithms of the same generality by exploiting parallelism among consecutive reads ...

متن کامل

Model-Driven Tile Size Selection for DOACROSS Loops on GPUs

DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make tiling legal and exploit wavefront parallelism across the tiles and within a tile. Thus, tile size selection, which is performance-critical, becomes more complex for DOACROSS loops than DOALL loops on GPUs. This paper pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEEE Trans. Computers

دوره 47  شماره 

صفحات  -

تاریخ انتشار 1998